A conceptual framework and empirical research for classifying visual descriptors
Abstract
This paper presents exploratory research evaluating a conceptual structure for the description of visual content of images. The structure, which was developed from empirical research in several fields (e.g., Computer Science, Psychology, Information Studies), classifies visual attributes into a Pyramid containing four syntactic levels (type/technique, global distribution, local structure, composition) and six semantic levels (generic, specific, and abstract object and scene, respectively). Several experiments are presented that address the Pyramid's ability to support three tasks: (1) classification of terms describing image attributes generated in a formal and an informal description task, (2) classification of terms that result from a structured approach to indexing, and (3) guidance in the indexing process. Descriptions generated by naïve users and by indexers are used in experiments that include two image collections: a random web sample and a set of news images. To test descriptions generated in a structured setting, an Image Indexing Template (developed independently of this project by one of the authors over several years) was also used. The experiments performed suggest that the Pyramid is conceptually robust (i.e., can accommodate a full range of attributes) and that it can be used to organize visual content for retrieval, to guide the indexing process, and to classify descriptions obtained manually and automatically.
Introduction
Technologies for the digitization of analog image collections and the digital production of new images are combining to create vast digital image libraries. The demand for networked access and sharing of these images has created a need for new and more efficient techniques to index their content.
Access to these image collections through traditional indexing techniques is problematic for a number of reasons, as existing indexing systems have been created for the needs of limited audiences or targeted for particular types of collections.[1] Newer content-based[2] techniques have utility in retrieving subsets of specific visual attributes and may be useful as tools for segmenting large image collections, but currently address only a small portion of the complete range of image attributes of potential interest to users of digital image collections. More recently, there has been an interest among computer scientists in combining these techniques with other more traditional indexing techniques, such as the use of broad ontologies (Chang et al., 1997).
Other recent initiatives focus on metadata structures for image information. Two sets of proposed attributes have been widely disseminated: the Dublin Core (Dublin Core, 2000) and the VRA Core (VRA Data Standards Committee, 2000). The Art Information Task Force has also proposed the Categories for the Description of Works of Art (Getty Information Institute, 1999). Another group addressing metadata standards is the Motion Pictures Experts Group (MPEG). Their latest initiative, known as MPEG-7,[3] is developing standards for the description of multimedia content, which may include any combination of still images, moving images, audio, and text. The research reported herein was completed as part of the MPEG-7 initiative.
The goal of the current research was to test a particular structured representation for the classification of a wide range of image attributes of interest in a retrieval context. The research evaluated and compared the structure in relation to image descriptions resulting from two very different methodologies: a conceptual modeling (or “top-down”) approach and a data-driven (or “bottom-up”) approach.
[1] Two of the more widely used systems in the United States are the Thesaurus for Graphic Materials I (TGM I, Library of Congress, 2000) and the Art and Architecture Thesaurus (AAT, Getty Research Institute, 2000). The TGM, although created as a tool for indexing broad, general image collections, was developed to meet the needs of the Prints and Photographs Division and is more appropriate to collections of a historical nature. The AAT is a precise indexing tool which meets the needs of specialized communities of researchers and provides access at a high level of specificity. A review of image indexing systems is provided in Jörgensen, 2000 and Rasmussen, 1997.
[2] The phrase “content-based retrieval” comes from the Electrical Engineering/Computer Science community. “Content-based” refers to retrieval of visual information based on what is depicted (color, texture, objects, etc.).
[3] The goal of MPEG-7 is to specify a standard set of descriptors (Ds) for content representation for multimedia information search (indexing and retrieval or “pull” applications), selection and filtering (“push” applications), and management and processing. See http://www.cselt.it/mpeg/standards/mpeg-7/mpeg-7.htm for more information.
Related Research
Work on issues related to images has been performed by researchers in many different areas. Selective examples follow. Studies in art have focused on interpretation and perception (Arnheim, 1984; Buswell, 1935), aesthetics and formal analysis (Barnet, 1997), visual communication (Dondis, 1973), and levels of meaning (Panofsky, 1962). Studies in cognitive psychology have dealt with issues such as perception (Hendee & Wells, 1997), visual similarity (Tversky, 1977), mental categories (i.e., concepts) (Armstrong et al., 1983), distinctions between perceptual and conceptual category structure (Burns, 1992; Harnad, 1987), and internal category structure (i.e., levels of categorization) (Morris & Murphy, 1990; Rosch & Mervis, 1975). In the field of library and information science (LIS), work has been performed on analyzing the subject of an image (Shatford Layne, 1986; Turner, 1994), indexing (Fidel et al., 1994; Shatford Layne, 1994), the range of attributes that are used to describe images (Jörgensen, 1998), classification (Lohse et al., 1994), query analysis (Enser, 1993), and indexing schemes (Davis, 1997; Jörgensen, 1996b), among others.
The Conceptual Model
The conceptual model (the “Pyramid”) was developed drawing upon this previous body of research. The structure of the Pyramid is briefly outlined below; examples, justification, and further details for each level can be found in Jaimes & Chang, 2000. The Pyramid (Figure 1) contains ten levels: the first four refer to syntax, and the remaining six refer to semantics. In addition, levels one to four are directly related to percept, and levels five through ten to visual concept. While some of these divisions may not be strict, they should be considered because they have a direct impact on understanding what the user is searching for and how s/he tries to find it. The levels also emphasize the limitations of different indexing techniques (manual and automatic) in terms of the knowledge required.
The research on visual information that has been carried out in different fields shows that indexing such information can be particularly complex for several reasons. First, visual content carries information at many different levels (e.g., syntactic: the colors in the image; semantic: the objects in the image). Second, descriptions of visual content can be highly subjective, varying both across indexers and users, and for a single user over time.
Such descriptions depend on other factors that include, for example, the indexer’s knowledge (e.g., art historian), purpose of the database (e.g., education), database content (e.g., fine art images; commercial images), and the task of the user (find a specific image or a “meaningful” image).
Three main factors entered into the construction of the proposed model: (1) range of descriptions; (2) related research in various fields; and (3) generality. In considering the range of descriptions, the focus was only on visual content (i.e., any descriptors stimulated by the visual content of the image or video in question; the price of a painting would not be part of visual content). Since such content can be described in terms of syntax or semantics, the structure contains a division that groups descriptors based on those two categories. This division is of paramount importance, particularly when we observe research in different fields. Most of the work on content-based retrieval, for example, supports syntactic-level indexing, while work in art places strong emphasis on composition (i.e., relationships between elements) both at the syntactic (i.e., how colors, lines, and patterns are laid out) and semantic levels (i.e., the meaning of objects and their interactions). Most of the work in information science, on the other hand, focuses on semantics. The structure was developed based on research and existing systems in different fields.
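The ten-level structure and its syntax/semantics division can be illustrated with a small data structure. The following Python sketch is not part of the original research: the class and function names are hypothetical, the ordering of the six semantic levels is an assumption read from the abstract's phrasing ("generic, specific, and abstract object and scene"), and the sample terms are invented for illustration only. It assumes a human indexer has already assigned each term to a level.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    SYNTAX = "syntax"        # levels 1-4, related to percept
    SEMANTICS = "semantics"  # levels 5-10, related to visual concept


@dataclass(frozen=True)
class Level:
    number: int
    name: str

    @property
    def kind(self) -> Kind:
        # The first four levels are syntactic; the remaining six are semantic.
        return Kind.SYNTAX if self.number <= 4 else Kind.SEMANTICS


# Level names follow the abstract; the order of levels 5-10 is an assumption.
LEVEL_NAMES = [
    "type/technique",
    "global distribution",
    "local structure",
    "composition",
    "generic object",
    "generic scene",
    "specific object",
    "specific scene",
    "abstract object",
    "abstract scene",
]
PYRAMID = {i: Level(i, name) for i, name in enumerate(LEVEL_NAMES, start=1)}


def group_by_kind(tagged_terms):
    """Group (term, level_number) pairs into syntactic and semantic terms."""
    groups = defaultdict(list)
    for term, number in tagged_terms:
        if number not in PYRAMID:
            raise ValueError(f"no Pyramid level numbered {number}")
        groups[PYRAMID[number].kind].append(term)
    return dict(groups)


if __name__ == "__main__":
    # Invented sample terms, tagged by hand for illustration.
    sample = [("color photograph", 1), ("mostly blue", 2), ("woman", 5)]
    for kind, terms in group_by_kind(sample).items():
        print(kind.value, terms)
```

Deriving the syntax/semantics label from the level number, rather than storing it separately, keeps the two divisions described in the text consistent by construction.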
Journal: JASIST
Volume: 52
Publication date: 2001